Abstract

Modern large displacement optical flow algorithms usually use an initialization by either sparse descriptor matching techniques or dense approximate nearest neighbor fields. While the latter have the advantage of being dense, they have the major disadvantage of being very outlier prone as they are not designed to find the optical flow, but the visually most similar correspondence. In this paper we present a dense correspondence field approach that is much less outlier prone and thus much better suited for optical flow estimation than approximate nearest neighbor fields. Our approach is conceptually novel as it does not require explicit regularization, smoothing (like median filtering) or a new data term, but solely our novel purely data based search strategy that finds most inliers (even for small objects), while it effectively avoids finding outliers. Moreover, we present novel enhancements for outlier filtering. We show that our approach is better suited for large displacement optical flow estimation than state-of-the-art descriptor matching techniques. We do so by initializing EpicFlow (so far the best method on MPI-Sintel) with our Flow Fields instead of their originally used state-of-the-art descriptor matching technique. We significantly outperform the original EpicFlow on MPI-Sintel, KITTI and Middlebury.


1. Introduction 

Finding the correct dense optical flow between images or video frames is a challenging problem. While the visual similarity between two image regions is the most important clue for finding the optical flow, it is often unreliable due to illumination changes, deformations, repetitive patterns, low texture, occlusions or blur. Hence, basically all dense optical flow methods add prior knowledge about the properties of the flow, like local smoothness assumptions [18], structure and motion adaptive assumptions [30], the assumption that motion discontinuities are more likely at image edges [26], or the assumption that the optical flow can be approximated by a few motion patterns [9]. The most popular of these assumptions is the local smoothness assumption. It is usually incorporated into a joint energy based regularization that rates data consistency together with the smoothness in a variational setting of the flow. One major drawback of this setting is that fast minimization techniques usually rely on local linearization of the data term and thus can adapt the motion field only very locally. Hence, these methods have to use image pyramids to deal with fast motions (large displacements) [6]. In practice, this fails in cases where the determined motion on a lower scale is not very close to the correct motion of a higher scale.

(c) Our outlier filtered Flow Field (d) Ground truth

Figure 1. Comparison of state-of-the-art approximate nearest neighbor fields (a) and Flow Fields (b) with the same data term. a) and b) are shown with ground truth occlusion map (black pixels). c) is after outlier filtering; occluded regions are successfully filtered. It can be used as initialization for an optical flow method.

In contrast, for purely data based techniques like approximate nearest neighbor fields (ANNF) and sparse descriptor matches there are fast approaches that can efficiently perform a global search for the best match on the full image resolution. However, as there is no regularization, (approximate) nearest neighbor fields (NNF) usually contain many outliers that are difficult to identify. Furthermore, even if outliers can be identified they leave gaps in the motion field that must be filled. Sparse descriptor matches usually contain fewer outliers as matches are only determined for carefully selected points with high confidence. However, due to their sparsity the gaps between matches are usually even larger than in outlier filtered ANNF. Gaps can be problematic, since a motion for which no match is found cannot be considered. Despite these difficulties, ANNF and sparse descriptor matches gained a lot of popularity in the last years as initial step of large displacement optical flow algorithms. Nowadays, nearly all top-performing methods on challenging datasets like MPI-Sintel rely on such techniques. However, while there are descriptor matching approaches like Deep Matching that are tailored for optical flow, dense initialization is usually simply based on ANNF - which is suboptimal. The intention behind ANNF is to find the visually closest match (NNF), which is often not identical with the optical flow. An important difference is that NNF are known to be very noisy regarding the offset of neighboring pixels, while optical flow is usually locally smooth and occasionally abrupt (see Figure 1).

In this paper we show that it is possible to utilize this fact to create dense correspondence fields that contain significantly fewer outliers than ANNF regarding optical flow estimation - not because of explicit regularization, smoothing (like median filtering) or a different data term, but solely because of our novel search strategy that finds most inliers while it effectively avoids finding outliers. We call them Flow Fields as they are tailored for optical flow estimation, while they are at the same time dense and purely data term based like ANNF. Flow Fields are conceptually novel as we avoid building on the popular, but for optical flow estimation inappropriate, (A)NNF concept. Our contributions are:

• A novel hierarchical correspondence field search strategy that features powerful non-locality in the image space (see Figure 5 a)), but locality in the flow space (for smoothness) and can utilize hierarchy levels (scales) as effective outlier sieves. It allows us to obtain better results with hierarchies/scales than without, even for tiny objects and other details.

• We extend the common forward backward consistency check by a novel two way consistency check as well as region and density based outlier filtering.

• We show the effectiveness of our approach by clearly outperforming ANNF and by obtaining the best result on MPI-Sintel and the second best on KITTI [13].

2. Related Work 

Dense optical flow research started more than 30 years ago with the work of Horn and Schunck [18]. We refer to publications like [27, 29] for a detailed overview of optical flow methods and the general principles behind them.

One of the first works that integrated sparse descriptor matching for improved large displacement performance was Brox and Malik. Since then, several works followed the idea of using sparse descriptors [34, 32, 20, 28, 26]. Only few works used dense ANNF instead. Chen et al. [9] showed that remarkable results can be achieved on the Middlebury evaluation portal by extracting the dominant motion patterns from ANNF. Revaud et al. [26] compared ANNF to Deep Matching for the initialization of their approach. They found that Deep Matching clearly outperforms ANNF. We will use their approach for optical flow estimation and show that this is not the case for our Flow Fields.

An important milestone regarding fast ANNF estimation was PatchMatch [4]. Nowadays, there are even faster ANNF approaches. There are also approaches that try to obtain correspondence fields tailored to optical flow. Lu et al. [23] used superpixels to gain edge aware correspondence fields. Bao et al. [3] used an edge aware bilateral data term instead. While the edge aware data term helps them to obtain good results, especially at motion boundaries, their approach is still based on the ANNF strategy to determine correspondences, although it is unfavorable for optical flow. HaCohen et al. presented a hierarchical correspondence field approach for image enhancement. While it does well in removing outliers, it also removes inliers that are not supported by a big neighborhood (in each scale). Such inliers are especially important for optical flow as they cannot be determined by the classical coarse to fine strategy. Our approach can not only preserve such isolated inliers, but can also spread them if needed (Figure 5 a)).

A technique that shares the idea of preferring locality (to avoid outliers) with our approach is region growing in 3D reconstruction. It is usually computationally expensive. A faster GPU parallelizable alternative based on PatchMatch was presented in our previous work. It shares some ideas with our basic approach in Section 3.1 but was not designed for optical flow estimation and lacks many important aspects of our approach in this paper.

3. Our Approach 

In this section we detail our Flow Field approach, our extended outlier filter and the data terms used in the tests of our paper. Flow Fields are described in two steps. First we describe a basic (non-hierarchical) Flow Field approach. Afterwards, we build our full (hierarchical) Flow Field approach on top of it. Given two images $I_1, I_2 \subset \mathbb{R}^2$ we use the following notation: $P_r(p_i)$ is an image patch with patch radius $r$ centered at a pixel position $p_i = (x, y)_i \in I_i$, $i = 1, 2$. The total size of our rectangular patch is $(2r + 1) \times (2r + 1)$ pixels. Our goal is to determine the optical flow field of $I_1$ with respect to $I_2$, i.e. the displacement field for all pixels $p_1 \in I_1$, denoted by $F(p_1) = M(p_1) - p_1 \in \mathbb{R}^2$ for each pixel $p_1$. $M(p_1)$ is the corresponding matching position $p_2 \in I_2$ for a position $p_1 \in I_1$. All parameters mentioned below are assigned in Section 4.



Figure 2. The pipeline of our Flow Field approach. For the basic 
approach we only consider the full resolution. 


3.1. Basic Flow Fields 




Figure 3. a) Example for the ability of propagation to propagate into different directions within a 90 degree angle. Gray pixels reject the flow of the green seed pixel. In practice each pixel is a seed. b) Pixel positions of Pi (green), Pf (blue) and Pi (red). The central pixel is in black. c) Our propagation directions.


The first step of our basic approach is similar to the kd-tree based initialization step of the ANNF approach of He and Sun [16]. We do not use any other step of [16] as we have found them to be harmful for optical flow estimation, since they introduce resistant outliers whose matching errors are below those of the ground truth. Once introduced, a purely data based approach without regularization cannot remove them anymore. The secret is to avoid finding them.^1

Our approach, outlined in Figure 2, works as follows: First we calculate the Walsh-Hadamard Transform (WHT) [17] for all patches $P_r(p_2)$ centered at all pixel positions $p_2$ in image $I_2$, similar to [16]. In contrast to them we use the first 9 bases for all three color channels in the CIELab color space. The resulting 27 dimensional vectors for each pixel are then sorted into a kd-tree with leaf size $l$. We also split the tree in the dimension of the maximal spread by the median value. After building the kd-tree we create WHT vectors for all patches $P_r(p_1)$ at all pixel positions in image $I_1$ as well and search the corresponding leaf within the kd-tree (the leaf a vector would belong to if we added it to the tree). All $l$ entries $L$ in the leaf found by the vector of the patch $P_r(p_1)$ are considered as candidates for the initial Flow Field $F(p_1)$. To determine which of them is the best we calculate their matching errors $E_d$ with a robust data term $d$ and only keep the candidate with the lowest matching error in the initial Flow Field, i.e.

$F(p_1) = \arg\min_{p_2 \in L} \big( E_d(P_r(p_1), P_r(p_2)) \big) - p_1$    (1)

This is similar to reranking in [16]. We call points in the initial Flow Field arising directly from the kd-tree seeds. A larger $l$ increases the probability that both correct seeds and resistant outliers are found. However, if both are found at a position the resistant outlier prevails. Thus, it is advisable to keep $l$ small and to utilize the local smoothness of optical flow to propagate rare correct seeds in the initial Flow Field into many surrounding pixels - outliers usually fail in this regard as their surrounding does not form a smooth surface. The propagation of our initial flow values works similar to the propagation step in the PatchMatch approach [4], i.e. flow values are propagated from positions $(x, y - 1)_1$ and $(x - 1, y)_1$ to position $p_1 = (x, y)_1$ as follows:

$F(p_1) = \arg\min_{p_2 \in G_1} \big( E_d(P_r(p_1), P_r(p_2)) \big) - p_1$
$G_1 = \{ F(p_1), F((x, y - 1)_1), F((x - 1, y)_1) \} + p_1$    (2)

$G_1$ are the considered flows for our first propagation step. It is important to process positions $(x, y - 1)_1$ and $(x - 1, y)_1$ with Equation 2 before position $(x, y)_1$ is processed. This allows the propagation approach to propagate into arbitrary directions within a 90 degree angle (see Figure 3 a)). As optical flow varies between neighboring pixels, but propagation can only propagate existing flow values, our next step is a random search step. Here, we modify the flow of each pixel $p_1$ by a random uniformly distributed offset $O_{rnd}$ of at most $R$ pixels. If the matching error $E_d$ decreases we replace the flow $F$ by the new flow $F + O_{rnd}$. $O_{rnd}$ is a subpixel accurate offset which leads to subpixel accurate positions $M(p_1)$. The pixel colors of $M(p_1)$ and $P_r(M(p_1))$ are determined by bilinear interpolation. Early subpixel accuracy not only improves accuracy, but also helps to avoid outliers as subpixel accurate matches have a smaller matching error.

^1 ANNF try to reproduce the NNF that contains all resistant outliers.

^2 For WHTs patches must be split in the middle. We found that quality does not suffer from splitting uneven patches (2r + 1) into sizes r and r + 1.

In total we alternately perform 4 propagation and 3 random search steps (all with the same $R$) as shown in Figure 2. While the first propagation step is performed to the right and bottom, the subsequent three propagation steps are performed into the directions shown in Figure 3 c). Many approaches that perform propagation (e.g. [16]) do not consider different propagation directions. Even the original PatchMatch approach only considers the first two directions. While these already include all 4 main directions, we have to consider that propagation actually can propagate into all directions within a quadrant (see Figure 3 a)) and that there are 4 quadrants in the full 360 degree range.
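A single propagation pass to the right and bottom (Equation 2) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes grayscale images, integer flows stored as (dy, dx), and a plain sum of squared differences as a stand-in for the data term $E_d$. The names `patch_error` and `propagate_right_down` are hypothetical.

```python
import numpy as np

def patch_error(img1, img2, p1, p2, r):
    """SSD stand-in for E_d between the patch around p1 in img1 and p2 in img2."""
    (y1, x1), (y2, x2) = p1, p2
    h, w = img2.shape
    if not (r <= y2 < h - r and r <= x2 < w - r):
        return np.inf  # candidate match leaves the image
    a = img1[y1 - r:y1 + r + 1, x1 - r:x1 + r + 1]
    b = img2[y2 - r:y2 + r + 1, x2 - r:x2 + r + 1]
    return float(np.sum((a - b) ** 2))

def propagate_right_down(img1, img2, flow, r):
    """One pass: each pixel tests its own flow and the flows of its upper and left neighbours."""
    h, w = img1.shape
    for y in range(r, h - r):          # rows top to bottom, so (x, y-1) is already final
        for x in range(r, w - r):      # columns left to right, so (x-1, y) is already final
            candidates = [flow[y, x].copy()]
            if y - 1 >= r:
                candidates.append(flow[y - 1, x].copy())
            if x - 1 >= r:
                candidates.append(flow[y, x - 1].copy())
            errors = [patch_error(img1, img2, (y, x), (y + f[0], x + f[1]), r)
                      for f in candidates]
            flow[y, x] = candidates[int(np.argmin(errors))]
    return flow
```

Because neighbours are updated before the current pixel, a single correct seed in the top left corner can reach every pixel below and to the right of it in one pass. The other three passes would mirror the loop directions.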

Extensive propagation with random search is important to distribute rare correct seeds into the whole Flow Field. The locality of single propagation steps and random search (with small $R$) effectively prevents the Flow Field from introducing new outliers not existing in the initial Flow Field.
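The random search step with subpixel offsets can be sketched in the same spirit. The assumptions here are ours, not the paper's: offsets are drawn uniformly from the square [-R, R] x [-R, R], the data term is again a plain SSD, and patches in the second image are sampled by bilinear interpolation. `bilinear_patch`, `flow_error` and `random_search` are hypothetical names.

```python
import numpy as np

def bilinear_patch(img, cy, cx, r):
    """Sample a (2r+1)x(2r+1) patch centred at the subpixel position (cy, cx)."""
    h, w = img.shape
    ys = cy + np.arange(-r, r + 1, dtype=float)
    xs = cx + np.arange(-r, r + 1, dtype=float)
    if ys[0] < 0 or xs[0] < 0 or ys[-1] > h - 1 or xs[-1] > w - 1:
        return None  # patch leaves the image
    y0 = np.minimum(np.floor(ys).astype(int), h - 2)
    x0 = np.minimum(np.floor(xs).astype(int), w - 2)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = (1 - wx) * img[np.ix_(y0, x0)] + wx * img[np.ix_(y0, x0 + 1)]
    bottom = (1 - wx) * img[np.ix_(y0 + 1, x0)] + wx * img[np.ix_(y0 + 1, x0 + 1)]
    return (1 - wy) * top + wy * bottom

def flow_error(img1, img2, y, x, f, r):
    """SSD stand-in for E_d of flow f = (dy, dx) at pixel (y, x)."""
    a = img1[y - r:y + r + 1, x - r:x + r + 1]
    b = bilinear_patch(img2, y + f[0], x + f[1], r)
    return np.inf if b is None else float(np.sum((a - b) ** 2))

def random_search(img1, img2, flow, r, R, rng):
    """Perturb every flow by a random offset of at most R pixels; keep it only if the error drops."""
    h, w = img1.shape
    for y in range(r, h - r):
        for x in range(r, w - r):
            candidate = flow[y, x] + rng.uniform(-R, R, size=2)
            if flow_error(img1, img2, y, x, candidate, r) < flow_error(img1, img2, y, x, flow[y, x], r):
                flow[y, x] = candidate
    return flow
```

Since a candidate is only accepted when it lowers the matching error, the step can never make the error of a pixel worse, and with a small `R` it stays local in flow space.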

3.2. Flow Fields 

Our basic Flow Fields still contain many resistant outliers arising from kd-tree initialization. We can further reduce their amount (and the amount of inliers) by not determining an initial flow value for each pixel. This helps as inliers usually propagate much further than outliers (optical flow is smooth, outliers are usually not). However, to cover the larger flow variations between fewer inliers (that are further apart from each other) the random search distance $R$ must be increased, which raises the danger of adding close by resistant outliers. A way to avoid this is to increase $r$ as well. This helps e.g. in the presence of repetitive patterns or poorly textured regions, but also significantly increases computation time and creates new failure cases, e.g. close to motion discontinuities and for small objects. Furthermore, a larger $r$ (and $R$) leads to less accurate matches.

Figure 4. Illustration of our hierarchical Flow Field approach. Flow offsets saved in pixels are propagated in all arrow directions.

We found a powerful solution (outlined in Figure 4) that avoids most of the disadvantages of large patches while being even more robust: First we define that $P^n_r(p_i)$ is a subsampled patch at pixel position $p_i$ with patch radius $r \cdot n$ that consists of only each $n$th pixel within its radius including the center pixel, i.e. (see Figure 3 b) for an illustration):

$(x^*, y^*)_i \in P^n_r(p_i) \Rightarrow |x^* - x| \le r \cdot n \,\wedge\, |y^* - y| \le r \cdot n \,\wedge\, |x^* - x| \bmod n = 0 \,\wedge\, |y^* - y| \bmod n = 0$    (3)


The pixel colors for $P^n_r(p_i)$ are not determined from image $I_i$, but from a smoothed version of $I_i$ that we call $I^n_i$. This is similar to using image pyramids and using $P_r$ on a higher pyramid level. The difference is that $I^n_i$ has the full image resolution and that $p_i$ is an actual pixel position on the full resolution, which effectively prevents upsampling errors.
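The pixel set of a subsampled patch as defined in Equation 3 can be written down directly. This is a small sketch with hypothetical function names; `smoothed_img` stands for the smoothed full-resolution image of the text and is assumed to be large enough around (y, x).

```python
import numpy as np

def subsampled_offsets(r, n):
    """Offsets (dy, dx) of the pixels in a subsampled patch: every nth pixel within radius r*n."""
    steps = np.arange(-r * n, r * n + 1, n)  # always contains 0, i.e. the centre pixel
    dy, dx = np.meshgrid(steps, steps, indexing="ij")
    return np.stack([dy.ravel(), dx.ravel()], axis=1)

def subsampled_patch(smoothed_img, y, x, r, n):
    """Read the (2r+1)x(2r+1) samples of the subsampled patch centred at the full-resolution pixel (y, x)."""
    steps = np.arange(-r * n, r * n + 1, n)
    return smoothed_img[np.ix_(y + steps, x + steps)]
```

The number of samples stays at (2r + 1) x (2r + 1) for every n, so the cost of one patch comparison does not grow with the patch extent.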

As $I^n_i$ only has to be calculated once we can afford to use an expensive low pass filter without noticeable difference in overall processing speed. In practice, we downsample $I_i$ by a factor of $n$ with area based downsampling, before upsampling it again with Lanczos interpolation to obtain $I^n_i$. We always start with $n = 2^k$. Our full Flow Field approach first initializes only each $n$th pixel $p^n_1 = (x_n, y_n)_1$ with $x_n \bmod n = 0$ and $y_n \bmod n = 0$ (see Figure 4). Initialization is performed similar to the basic approach:
Initialization is performed similar to the basic approach: 

$F(p^n_1) = \arg\min_{p_2 \in L} \big( E_d(P^n_r(p^n_1), P^n_r(p_2)) \big) - p^n_1$    (4)

Note that the kd-tree samples $L$ are identical to those of the basic approach. We still use non-subsampled patches $P_r(p_i)$ for the WHT vectors for an accurate initialization.

Figure 5. a) Flow Field obtained with $k = 3$ with b) as only initialization (black pixels in b) are set to infinity). It shows the power of our hierarchical propagation. c) Like a) but with kd-tree initialization. The 3 marked details are preserved due to their availability in the coarse level d). e) Like c) but without hierarchies (basic approach). Details are not preserved. f) Ground truth. Note: As correspondence estimation is impossible in occluded areas, and for orientation, we blacked such areas out.

After initialization we perform propagation and random search similar to the basic approach, except that we only propagate between points $p^n_1$, i.e. $(x_n - n, y_n)_1, (x_n, y_n - n)_1 \rightarrow (x_n, y_n)_1$ etc. (see Figure 4), and that we use $R_n = R \cdot n$ as maximum random search distance. After determining $F(p^n_1)$ using patches $P^n_r$, we determine $F(p^m_1)$, $m = n/2$, in the same way using patches $P^m_r$. Hereby, the samples $F(p^n_1)$ are used as seeds instead of kd-tree samples. Positions $p^m_1$ that are not part of $p^n_1$ receive an initial flow value in the first propagation step of the hierarchy level $k - 1$. This approach is repeated up to the full resolution $F(p^1_1) = F(p_1)$ (see Figures 2 and 4).
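The resulting coarse to fine schedule can be summarized in a few lines. This sketch only enumerates the levels and the grid positions processed on each level; the function names are hypothetical, and positions are given as (y, x) tuples.

```python
def hierarchy_schedule(k, R):
    """List (level, n, R_n) from the coarsest level k down to the full resolution (level 0)."""
    schedule = []
    for level in range(k, -1, -1):
        n = 2 ** level                      # only every nth pixel carries a flow value
        schedule.append((level, n, R * n))  # maximum random search distance R_n = R * n
    return schedule

def grid_positions(height, width, n):
    """Pixel positions (y, x) with x mod n = 0 and y mod n = 0."""
    return [(y, x) for y in range(0, height, n) for x in range(0, width, n)]
```

For k = 3 and R = 1 the schedule runs through n = 8, 4, 2, 1. Every grid of a coarser level is a subset of the next finer grid, which is why the coarse flows can serve directly as seeds on the finer level.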

Propagation and random search (with small enough $R$) are usually too local in flow space to introduce new outliers, while propagations of lower hierarchy levels are likely to remove most outliers persisting in higher levels, since resistant outliers are often not resistant on all levels. Thus, hierarchy levels serve as effective outlier sieves (see videos in supplementary material). Also, matching patches $P^n_r$ is mostly significantly more robust than matching patches $P_r$ if $r$ is sufficiently large. Deformations, for example, affect smoothed patches less, as smoothing allows more matching inaccuracy for a good match. Still, we obtain accurate flow values as we are iteratively increasing the resolution.

In contrast to ordinary multi scale approaches, our hierarchical approach is non-local in the image space. Figure 5 a) demonstrates how powerful this non-locality is. The Flow Field is only initialized by two flow values with a flow offset of 52 pixels to each other (Figure 5 b)). This is more than the random search step of all hierarchy levels together can traverse. Thus, the orange flow is a propagation barrier for the violet flow (like gray pixels in Figure 3 a)). Anyhow, our approach manages to distribute the violet flow and similar flows determined by random search throughout the whole image. We originally performed the experiment to prove that the flow can be propagated into the arms starting from the body, but our approach can even obtain the flow for nearly the whole image with such poor initialization.

Figure 5 c) shows that we can even find tiny objects with our hierarchical approach: The 3 marked objects are well preserved in c) due to their availability in the coarse level d). Remarkably, these objects are only preserved when using hierarchical matching. Our basic approach without hierarchies only preserves parts of the upper object (a butterfly) riddled with outliers, although its seeds are a superset of the seeds of the hierarchical approach - but it fails in avoiding resistant outliers. Our hierarchical approach preserves tiny objects due to unscaled WHTs (initialization) and since the image gradients around tiny objects create local minima in $E_d$, even for huge patches $P^n_r$. This is sufficient as lower minima (resistant outliers) are successfully avoided by our search strategy. Our visual tests showed that our approach with $k = 3$ in general preserves tiny objects and other details better than our basic approach. With too large $k$ (> 3) tiny objects are not that well preserved due to a lack of seeds.

3.3. Data Terms 


In our paper we consider two data terms: 


1. Census transform [36]. It is computationally cheap, illumination resistant and to some extent edge aware. We use the sum of census transform errors over all color channels in the CIELab color space for $E_d$.


2. Patch based SIFT flow. A SIFT flow pixel usually has $S = 3$ channels. The colors are determined by first calculating the 128 dimensional SIFT vector for each pixel and then reducing it by PCA to $S$ dimensions. The error between SIFT flow colors is determined by the $L_2$ distance. For the images $I^n_i$ we found it advantageous to smooth the SIFT flow images as described in Section 3.2 and to not use larger SIFT features instead. WHTs are still calculated in the CIELab color space.
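To illustrate why the census transform is illumination resistant, the following sketch computes a 3x3 census signature for a single channel and compares signatures by their Hamming distance. The window size, the bit order and the single-channel setting are assumptions made for this example; the section above does not specify them. `census_3x3` and `census_error` are hypothetical names.

```python
import numpy as np

def census_3x3(img):
    """8-bit census signature per pixel: bit i is set if neighbour i is darker than the centre."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    sig = np.zeros((h, w), dtype=np.uint8)
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        sig = sig | ((neighbour < img).astype(np.uint8) << np.uint8(bit))
    return sig

def census_error(sig1, sig2):
    """Sum of Hamming distances between two equally shaped arrays of census signatures."""
    diff = np.bitwise_xor(sig1, sig2)
    return int(np.unpackbits(diff.reshape(-1, 1), axis=1).sum())
```

Because only the order of intensities enters the signature, any strictly increasing brightness change (for example a gain and an offset) leaves it unchanged.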


3.4. Outlier Filtering

A common approach of outlier filtering is to perform a forward backward consistency check. We found that the robustness can be improved by a consistency check between two Flow Fields with different patch radii, as outliers often diverge into different directions. Practically, we calculate a backward flow for two patch radii $r$ and $r_2$ and delete a pixel if it is not consistent to both backward flows, i.e. if

$|F(p_1) + F^b_j(p_1 + F(p_1))| < \epsilon, \quad j \in \{1, 2\}$    (5)

is not fulfilled for one of the two backward flows $F^b_j$. For a 3 way check an additional forward flow could be added, but for a 2 way check an extra backward flow performs better (see supplementary material for an explanation).

After the consistency check many of the remaining outliers form small regions that were originally connected to removed outliers. Thus, we remove these regions as follows: First, we segment the partly outlier filtered Flow Field into regions. Neighboring pixels belong to the same region if the difference between their flow is below 3 pixels.^3 Then, we test for regions with less than $s$ pixels if it is possible for that region to add at least one outlier that was removed by the consistency check with the same rule. If this is possible, we found a small region that was originally connected to an outlier and we remove all points in that region.

3.5. Sparsification and Dense Optical Flow

To fill the gaps created by outlier filtering we use the edge preserving interpolation approach proposed by Revaud et al. [26] (EpicFlow). We found that EpicFlow does not work very well with too dense samples. Thus, we select one sample in each 3x3 region in the outlier filtered Flow Field if the region still contains at least $e$ samples. This is our last consistency check. We found that even after region based filtering most remaining outliers are in sparse regions where most flow values were removed. The sample that is selected is the sample for which the sum of both forward backward consistency check errors is the smallest.

4. Evaluation

We evaluate our approach on 3 optical flow datasets:

• MPI-Sintel: It is based on an animated movie and contains many large motions up to 400 pixels per frame. The test set consists of two versions: clean and final. Clean contains realistic illuminations and reflections. Final additionally adds rendering effects like motion and defocus blur and atmospheric effects.

• Middlebury: It was created for accurate optical flow estimation with relatively small displacements. Most approaches can obtain an endpoint error (EPE) in the subpixel range.

• KITTI [13]: It was created from a platform on a driving car and contains images of city streets. The motions can become large when the car is driving.

In Section 4.1 we perform experiments to analyze our approach and compare it to ANNF. In Section 4.2 we present our results in the public evaluation portals of the introduced datasets. For simplicity, we use $k = 3$ and $R = 1$, which we have found to perform well based on a few incoherent tests (and Tables 1 and 2 for $k$), $l = 8$ equivalent to [16], and $r = 8$ and $r_2 = 6$ as runtime tradeoffs for the census transform. Only $\epsilon$ (±1), $e$ (±1), $s$ (±50) and $r = r_2 ± 1$ for SIFT flow were tuned coherently on all training frames. $\epsilon$, $e$ and $s$ are set to 5, 4 and 50 for MPI-Sintel, to 1, 7 and 50 for Middlebury and to 1, 9 and 150 for KITTI, respectively. If not mentioned differently we use the census transform as data term. For EpicFlow applied on Flow Fields we use their standard parameters which are tuned for their Deep Matching features. For a fair comparison we use the same parameters (tuning $\epsilon$, $e$, $s$ for ANNF does not affect our results), data term and WHTs in CIELab space for the ANNF approach [16] (the original approach performs even worse). This includes ANNF results in Section 4.1 and in Figures 1 and 6. More details regarding parameter selection and more experiments can be found in our supplementary material.

^3 Only the flow differences between neighboring pixels count. The flow values of a region can vary by an arbitrary offset.

Figure 6. The left 4 columns show example results. Images is the average of both input images. For ANNF we use [16] in a fair way (see text). FF means Flow Fields. OM means that the ground truth occlusion map is added (black pixels, it is incomplete at image boundaries). Filtered FF is after outlier filtering (deleted pixels in black). FF+Epic is EpicFlow applied on our Flow Fields. EpicFlow is the original EpicFlow. Right column: a) Our approach fails in the face of the right person (outlier) and at its back (blue samples too far right). Still our EPE is smaller due to more preserved details. b) The marked bright green flow is not considered due to too strong outlier filtering. This makes a huge difference here. c) We show that our Flow Fields (bottom left) perform much better in presence of blur than ANNF (top left).

Visual results are shown in Figure 6. EpicFlow can preserve considerably more details with our Flow Fields than with the original Deep Matching features. Even in failure cases like in Figure 6 a) (right column), our approach often still achieves a smaller EPE thanks to more preserved details. Note that the shown failure cases also happen to the original EpicFlow. Despite more details our approach in general does not incorporate more outliers. The occasional removal of important details like the one marked in Figure 6 b) remains an issue - even for our improved outlier filtering approach. The marked detail is important as the flow of the very fast moving object is different on the left (brighter green). Still, we can in general preserve more details than the original EpicFlow. Figure 6 c) shows that our approach also performs well in presence of motion and defocus blur.





















4.1. Experiments 

In the introduction we claimed that our Flow Fields are better suited for optical flow estimation than ANNF and contain significantly fewer outliers. To prove our statement quantitatively we compare our Flow Fields with different numbers of hierarchy levels $k$ to the state-of-the-art ANNF approach presented in [16]. We also compare to the real NNF calculated in several days on the GPU. The comparison is performed in Table 1 with 4 different measures:


Method              < 3 pixel   EPE10   EPE      Epic
k = 3+median        92.17%      0.91    4.41     2.13
k = 3               89.20%      1.30    6.04     2.04
k = 2               88.79%      1.36    8.84     2.08
k = 1               86.88%      1.57    14.65    2.27
k = 0               79.13%      2.29    32.51    2.81
ANNF [16]           68.05%      3.38    59.11    3.41
NNF                 60.20%      4.18    110.30   -^4
Original EpicFlow   -           -       -        2.48

Table 1. Comparison of different correspondence fields on a representative subset (2x every 10th frame) on non-occluded regions of the MPI-Sintel training set (clean and final). See text for details.


• The percentage of flows with an EPE below 3 pixels.


• The EPE bounded to a maximum of 10 pixels for each flow value (EPE10). Outliers in correspondence fields can have arbitrary offsets, but the difficulty to remove them does not scale with their EPE. Local outliers can even be more harmful since they are more likely to pass the consistency check. The EPE10 considers this.


• The real endpoint error (EPE) of the raw correspondence fields. It has to be taken with care (see EPE10).


• The EPE after outlier filtering (like in Section 3.4) and utilizing EpicFlow to fill the gaps (Epic).
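The first three measures follow directly from the per pixel endpoint error. A compact sketch, assuming flows and ground truth are arrays of shape (H, W, 2) and `valid` is a boolean mask of the non-occluded pixels (the function name and defaults are ours):

```python
import numpy as np

def correspondence_measures(flow, gt, valid, inlier_px=3.0, cap_px=10.0):
    """Return (share of flows with EPE below inlier_px, capped EPE, plain EPE) over valid pixels."""
    epe = np.linalg.norm(flow - gt, axis=-1)[valid]
    share_below = float(np.mean(epe < inlier_px))
    capped = float(np.mean(np.minimum(epe, cap_px)))  # EPE10 for cap_px = 10
    return share_below, capped, float(np.mean(epe))
```

A single far-off outlier changes the plain EPE strongly but contributes at most `cap_px` to the capped measure, which is the motivation given for EPE10 above.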


All 4 measures are determined in non-occluded areas only, as it is impossible to determine data based correspondences in occluded areas. As can be seen, we can determine nearly 90% of the pixels on the challenging MPI-Sintel training dataset with an EPE below 3 pixels, relying on a purely data based search strategy which considers each position in the image as possible correspondence. With weighted median filtering (weighted by matching error) this number can even be improved further, but the distribution is unfavorable for EpicFlow (it probably removes important details similar to some regularization methods). In contrast, more hierarchy levels up to the tested $k = 3$ have a positive effect on the EPE as they successfully can provide the required details.

Bao et al. [3] also used hierarchical matching in their approach to speed it up. However, despite joint bilateral upsampling combined with local patch matching in a 3x3 window they found that the quality on Middlebury drops clearly due to hierarchical matching. As can be seen in Table 2 this is not the case for our approach. As expected from the experiment in Figure 5 the quality even rises. Note that the Epic result does not rise much as EpicFlow is not designed for datasets like Middlebury with EPEs in the subpixel range. Even with the ground truth it does not perform much better than with our approach. Our upsampling strategy requires 11 patch comparisons while [3] requires 9 comparisons and joint bilateral upsampling. However, in contrast to their upsampling strategy ours is non-local, which means that we can easily correct inaccuracies and errors from a coarser level (the non-locality is demonstrated in Figure 5 a)).


Method              < 1 pixel   EPE3    EPE     Epic
Ground truth        100%        0.0     0.0     0.214
k = 3               87.08%      0.499   1.16    0.239
k = 2               86.81%      0.508   2.32    0.240
k = 0               81.93%      0.670   12.33   0.240
Original EpicFlow   -           -       -       0.380

Table 2. Comparison of our approach with different hierarchy levels on the Middlebury training dataset to demonstrate that the quality does not suffer from hierarchical matching like in [3]. Note that the Epic result is biased to the value in the first row.



Figure 7. Percentage of removed outliers versus percentage of
removed inliers, for an outlier threshold of 5 pixels (we vary
ε). Axis label: Removed Outliers in %.


Outlier Filtering Figure 7 shows the percentage of outliers
that are removed versus the percentage of inliers that are
removed by different consistency checks on the MPI-Sintel
training set. Both the 2x consistency check and the region
filter increase the number of removed outliers for a fixed
inlier ratio. We also considered using the matching error
E_d for outlier filtering, but this yields no significant
gain (see supplementary material).
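
As an illustration of what a consistency check does, the following is a minimal sketch of a standard forward-backward check on dense flow fields. It is not the 2x consistency check or the region filter of this paper, whose details are given elsewhere; the grid layout, the rounding to the nearest pixel and the function name are assumptions.

```python
def consistency_mask(forward, backward, eps):
    """Return a grid of booleans, True where the forward match is kept.

    forward[y][x] and backward[y][x] hold (u, v) displacements in pixels.
    A match is kept if following the forward flow and then the backward
    flow ends within eps pixels of the starting position.
    """
    height, width = len(forward), len(forward[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = forward[y][x]
            tx, ty = int(round(x + u)), int(round(y + v))
            if not (0 <= tx < width and 0 <= ty < height):
                continue  # target outside the image: treated as an outlier
            bu, bv = backward[ty][tx]
            du, dv = u + bu, v + bv  # residual after the round trip
            mask[y][x] = (du * du + dv * dv) ** 0.5 <= eps
    return mask
```

Varying `eps` trades removed outliers against removed inliers, which is the kind of curve plotted in Figure 7.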

4.2. Results 

MPI-Sintel Our results compared to other approaches on
MPI-Sintel can be seen in Table 3. We clearly outperform
the original EpicFlow as well as all other approaches. We
can reduce the EPE on final by nearly 0.5 pixels and on
clean by nearly 0.4 pixels. Most of this advance is obtained
in the non-occluded area, but EpicFlow also rewards our
better input in the occluded areas. On clean we can reduce
the EPE in non-occluded areas to only 1.056 pixels, which
is far ahead of the performance of most other approaches. On
final we can drastically reduce the error of fast motions of
more than 40 pixels (s40+). Our approach also performs
well close to occlusion boundaries (d0-10).

^No backward flow calculated


Middlebury On Middlebury we obtain an average rank
of 38.0 (EpicFlow: 52.2) and an average EPE of 0.33
(EpicFlow: 0.39). Our rank is either exactly the same as
that of EpicFlow (e.g. 69 on Army) or better (e.g. 4 instead
of 53 on Urban). As already discussed in Section 4.1, the EPE
rank that can be obtained with EpicFlow on Middlebury is
limited, as EpicFlow is not designed for such datasets.
Nevertheless, we can improve the result on some datasets.


KITTI On KITTI, patch based approaches seem to either
perform poorly [?], use scale robust features [31] or special
techniques like plane fitting [?]. We think this is because
image patches of walls and the street undergo strong
scale changes and deformations (due to the high view angle).
With the census transform our results are good for an
unmodified patch based approach but not state-of-the-art (see
supplementary material). However, as our approach allows
data terms to be exchanged as easily as parameters, we use
the more deformation and scale robust SIFT flow data term to
obtain the results on KITTI presented in Table 4.^ We use
small patches with r = 3 and r_2 = 2, as the benefit of SIFT
being scale and deformation robust is otherwise destroyed.
Due to the small patch sizes we use S = 12 and S_2 = IS
for the second consistency check as a runtime trade-off. As
can be seen, we just missed the best approach by 0.01% in
> 3 pixel nocc. Our approach performs slightly worse only in
> 3 pixel all. However, note that interpolation into the
occluded areas is performed by EpicFlow. There might be better
interpolation methods for the specific application of planar
street scenes. Compared to the original EpicFlow we are much
better. Indeed, our approach is currently the only one with
top performance on Sintel clean and final, as well as KITTI.
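
For readers unfamiliar with the census transform mentioned above, the following sketch shows the basic idea on a 3x3 window: each neighbor is compared against the center pixel, and two descriptors are compared by their Hamming distance. The window size, bit order and function names are assumptions for illustration and do not reproduce the data term actually used in the experiments.

```python
def census_3x3(image, x, y):
    """8-bit census descriptor of the 3x3 window centered at (x, y).

    image is a list of rows of gray values; (x, y) must not lie on the border.
    A bit is 1 where the neighbor is darker than the center pixel.
    """
    center = image[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            bits = (bits << 1) | (1 if image[y + dy][x + dx] < center else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two census descriptors."""
    return bin(a ^ b).count("1")
```

Because only the ordering of intensities matters, the descriptor is unchanged under any strictly increasing brightness change, which makes it robust to illumination changes but not to scale changes or deformations of the patch.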

Interestingly, although we have to use very small
patches on KITTI, our hierarchical approach (with enlarged
but blurred patches) still works very well. This
demonstrates that the concept of hierarchical matching works
even in challenging cases where matching large patches fails.


Method (Final)         EPE all   EPE nocc.   EPE occ.   d0-10   s40+
Flow Fields            5.810     2.621       31.799     4.851   33.890
EpicFlow [2?]          6.285     3.060       32.564     5.205   38.021
TF+OFM [20]            6.727     3.388       33.929     5.544   39.761
SparseFlowFused [28]   7.189     3.286       38.977     5.567   44.319
DeepFlow [32]          7.212     3.336       38.781     5.650   44.118
NNF-Local [9]          7.249     2.973       42.088     4.896   44.866

Method (Clean)         EPE all   EPE nocc.   EPE occ.   d0-10   s40+
Flow Fields            3.748     1.056       25.700     2.784   23.602
EpicFlow [2?]          4.115     1.360       26.595     3.660   25.859
PH-Flow [35]           4.388     1.714       26.202     3.612   27.997
NNF-Local [9]          5.386     1.397       37.896     2.722   36.342

Table 3. Results on MPI-Sintel. Bold results are the best,
underlined the second best. (n)occ = (non) occluded. d0-10 =
0-10 pixels from occlusion boundary. s40+ = motions of more
than 40 pixels.


Method            Rank   > 3 pixel nocc.   > 3 pixel all   EPE nocc.   EPE all
PH-Flow [35]      1      5.76 %            10.57 %         1.3 px      2.9 px
Flow Fields       2      5.77 %            14.01 %         1.4 px      3.5 px
NLTGV-SC [25]     3      5.93 %            11.96 %         1.6 px      3.8 px
DDS-DF [31]       4      6.03 %            13.08 %         1.6 px      4.2 px
TGV2ADCSIFT [5]   5      6.20 %            15.15 %         1.5 px      4.5 px
EpicFlow [2?]     13     7.88 %            17.08 %         1.5 px      3.8 px

Table 4. Results on the KITTI test set. The table rank is the
original rank excluding non optical flow methods. nocc. =
non-occluded.


Runtime Our approach including EpicFlow requires 18s
for a frame in MPI-Sintel running on the CPU.^ By using
patches with r = 6 and no second consistency check we can
reduce the total time to 10s with an EPE increase of only
0.13 on final (training set) and even a decrease of 0.02 on
clean, as smaller patches perform better here. On KITTI our
approach with SIFT flow needs 23 seconds per image (13
seconds without PCA). The best approach, PH-Flow, needs
800s and the third best, NLTGV-SC, 16s, but on the GPU.

^ Our approach with SIFT flow also outperforms EpicFlow on the
MPI-Sintel and Middlebury training sets (but by a smaller
margin). See supplementary material.

^ In detail: 3 x 0.4s for kd-tree initialization, 2 x 5s + 1 x 3s
for the three Flow Fields, 0.1s for outlier filtering and 3.5s
for EpicFlow.
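
As a quick check of the runtime figures, the per-stage timings listed in the footnote for MPI-Sintel can be summed. The stage labels below are paraphrased and the data layout is an assumption; the counts and times are taken from the footnote.

```python
# (stage label, number of runs, seconds per run), as listed in the footnote
STAGES = [
    ("kd-tree initialization", 3, 0.4),
    ("Flow Fields, 5 s each", 2, 5.0),
    ("Flow Fields, 3 s", 1, 3.0),
    ("outlier filtering", 1, 0.1),
    ("EpicFlow", 1, 3.5),
]


def total_seconds(stages):
    """Sum of runs times seconds per run over all stages."""
    return sum(count * seconds for _, count, seconds in stages)


total = total_seconds(STAGES)
print(f"total: {total:.1f} s")
```

The sum is 17.8 s, consistent with the roughly 18s per frame stated in the text.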

5. Conclusion and Future Work 

In this paper we presented a novel correspondence field
approach for optical flow estimation. We showed that our
Flow Fields are clearly superior to ANNF and better suited
than state-of-the-art descriptor matching approaches
regarding optical flow estimation. We also presented
advanced outlier filtering and demonstrated that we can
obtain promising optical flow results, utilizing a
state-of-the-art optical flow algorithm like EpicFlow. With
our results, we hope to inspire research on dense
correspondence field estimation for optical flow. So far,
sparse descriptor matching techniques are much more popular,
as too little effort has been spent on improving dense
techniques.

In future work, more advanced data terms can be tested.
Thanks to intensive research, mainly in stereo estimation,
there are nowadays many improvements, e.g. for the census
transform [10, 25, 24, 22]. These can probably be used to
further improve our approach. It is also promising to
estimate patch deformations by random search [?]. It is known
that this works well for patch normals in 3D reconstruction
[1].